In-Memory Database Guide
Pyramid’s In Memory Database (IMDB) is an optional component of the Pyramid platform. Data can be written to IMDB as a part of a typical Pyramid data model and semantic models can be built against it to describe tables, their relationships and attributes for analysis.
The built in geospatial lookup database shipped with Pyramid is built into an IMDB database. While it is optional, the mapping tools in the product will not operate without it
Overview
The Pyramid in-memory database is a column store, compressed relational database (obviously) hosted in memory. Pyramid reads and writes data to it using standard ANSI SQL in the same way as any other relational database. Its primary function is for exceptional fast reading and table scanning - it is not designed to act as an transactional data store. If IMDB is used in a production oriented, mission critical solution, more attention may be needed in its configuration and scaling than the standard defaults.
Scaling Guide
IMDB is designed to offer an alternative to other in-memory column-store database technologies like Microsoft Tabular and SAP Hana. Since it houses customer data, which is idiosyncratic in design, size and formulation, it is hard to provide specific guidance on servers hosting the in-memory engine.
The in-memory server at least needs the requisite memory to host the analytic databases. Beyond that, its performance is heavily affected by the number of available CPU's. As such, separating it from the other Pyramid tiers on its own server is highly recommended so it does not compete with other processing jobs. Having very large data sets or high concurrency running through the IMDB can potentially block the performance of the other services. Therefore, the IMBD service should be allocated its own dedicated server and not be shared with other Pyramid services.
In-Memory Size Guidance
An exact sizing technique for sizing databases hosted in the In-memory engine is complex. In, general, Pyramid’s in-memory databases are like those produced by other column-store technologies like Microsoft SSAS Tabular and SAP Hana. The guides below are based on real-world examples and experience. But they could be quite different in each scenario.
- On disk the save database (in all these technologies) is compressed 7-10 times of the original relational data source size. The amount of compression on disk is driven by two factors:
- Data types - integer’s vs decimals vs dates vs text/string. Each data type has its own byte length and size. Integers are lowest through to strings which are heaviest. The heavier the data type, the more space it requires.
- Cardinality - the number of unique values for a given column. If there is a high amount of repetition, the compression is higher. If each value in a column is entirely unique, there will be far less compression.
- In memory the database expands its footprint. A rough guide is that it will occupy twice its compressed disk footprint. So, the loaded database will be compressed 3-5 times of its original data source size in memory.
Using many sample databases, Pyramid has found that a rough rule of thumb is that a 10 million row fact table may take 2Gb of memory.
In-memory performance is affected by these resources:
- CPU: CPU's are used to crank query results, and because there is no “read” time like on-disk databases, the CPU's are the main limiting factor in processing requests. The number of concurrent requests that can be handled, therefore, is a simple function of the number of cores at the engine’s disposal.
- Memory Quality: Although this far less impact than the speed and number of CPU's, faster memory will obviously load and process tasks quicker than older memory modules.
- Disk Speed: this has a very minor role for housing the saved databases. Load up speed and save speed however are affected by disk speed. For very large systems, this can impact the build and load up time considerably
Custom Configuration
Location
Pyramid IMDB can be configured using an initialization file. This file is named irdb.ini and can be found in the Pyramid installation directories. The default locations for the Pyramid installation directories are C:\Program Files for Windows and /opt for Linux. If the directories are not present at these locations, consult your administrators as to where the software was installed.
Within the Pyramid directory structure, irdb.ini can be found in the directory …\Pyramid\imdb for Windows and …/Pyramid/imdb for Linux.
Default Configuration Parameters
There are several entries in irdb.ini that are set by the installer and will appear something like this if all defaults are accepted:
[general]
port=12170
password=xxxxxxx
data_dir=C:\Program Files\Pyramid\repository\imdata
unload_timer=0
unload_temptables=10
nocache=false
The entries port, password and data_dir are required. All other parameters are optional.
Optional Configuration Parameters
All configuration parameters are currently placed in the [General] section of the irdb.ini file.
We do not recommend adjusting configurations on production deployments unless they have been thoroughly test first.
Parameter |
Description |
data_dir = <directory path> |
Directory that IMDB uses for persistent storage of databases. Default from installation for Windows is C:\Program Files\Pyramid\repository\imdata and for Linux is /opt/Pyramid/repository/imdata. |
encrypt = true | false |
Setting this to true, will make the server run in encrypted mode. |
encrypt_cert = <file name location> |
Specifies the file name and location of the server certificate for encryption. (PFX File) |
encrypt_password = <password> |
Specifies the encryption password for the certificte specified by encrypt_cert |
lazyload = true | false |
Setting this to true will enable the LazyLoad mode. This loads columns on demand and reduces memory consumption. |
lazyload_unload_column_timer = n |
Setting this to a number e.g. 15 will cause database columns to be unloaded after 15 minutes of inactivity, when lazy-load is set to true. |
logall = true | false |
Setting this to false will reduce the amount of data logged on a query. No timing, SQL or other information is logged. This helps performance when running large numbers of queries |
log_dir = <directory path> |
Lets you specify the directory where the IMDB server log file, irdb.log, is placed. |
maxcpus = n |
Specify an integer here to limit the maximum number of cores that the IMDB server uses to process a single query. When running on computers with large number of cores, say greater than 16, setting a value of 50% of the number of cores, can help performance. |
nocache = false | true |
Setting this to true, will stop the engine from caching sub-select inner queries. |
password = <password> |
Password to validate against when clients connect to IMDB. The default is pyramid. |
port = n |
Indicates the TCP/IP port number to connect to IMDB. The default set by the installer is 12170. |
query_log_size = n |
Setting this to a non zero number e.g. 10000 will cause the server to remember the last 10,000 queries for each database. |
querycache =true | false |
Set to true allows results of queries to be cached and reused as necessary (saving time). |
querycache_timeout = n |
Can be used to change the length of time the query cache is retained in minutes. |
skiptables = <tablename>| <expression>, <tablename>| <expression> … |
Specify a comma separate list of tables or table patterns not to load. Patterns use SQL like syntax. For example, skiptables=stage% would cause any table starting with stage not to load. Designed to help reduce memory consumption by specifying certain tables not to load. |
unload_temptables = n |
Specify an integer here that causes inactive temporary tables to be released automatically after that amount of time in minutes. The system default is 30 minutes. |
unload_timer = n |
Specify an integer here that causes inactive databases to be unloaded after that amount of time in minutes. This helps reduce memory consumption, when you have large numbers of databases, of which only a limited percentage are being used concurrently. Default is 0, which causes inactive databases not to be unloaded |
windowsauth = true | false |
Setting this to true, makes the server run in Windows Authentication mode. |
windowsauth_role = <Windows role name> |
When windowsauth is set, this specifies a role that is allowed to connect. Default value is Administrators |